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Introduction 

present inven««n ...es to a .ata p^ssor an. In pa«=u,a, « a data processor o, 
Reduced InstrucOon Set Computer (RISC) type data processor. 

.3 «,e com^onal re,ulre.ents o, data processing ■--;-^^^::Z 
passers have co,«spondlng„ tended to ^e^^sz- 
p^ssors are 16-t.. 32-b» and 6«l. P-ocessors ,.e. ^"^^^ 

the complexity and of the computations. 

.3en».^enada.ap.cessorlst>e,n.des.n.,.^^^^^^ 
various computational and other requirements ot the O^^J ^ ,„ 
^en the des-.ner. or programmer species '^'^^^^^"'r^Z numher o, 
Udde this tasK, spec-^ng the number o, bits o. datapa^ -^^ ^^ ' ^.^ 
Ulsters, memo^ and omer comp^«ona, -^"'-^^^^^^td aTsslble memory and 
contain at least a conflgurable logic unit, a plural*y 

tt« vartous datapaths be^n the components. Ha.n^ ^Zrl>L v.,1 then be 
tt«n Choose some processor and »lll then spec^ ^^'J™^;^^^^^ ,e or 

^^ledlnsll^n. --'"-:':^::,TZZ1ZZ... ^ P~=essor. 
sne has to maKe a choice J f ^^.^ a 3T-blt datapath 

5 suppose, for example, the requrrement .s aotua«y . ^ 
„,dm. 10 registers and a cert^n memon, and '^ TZ^so. and use K. or a 64.b.t 
as to ^ther he or she «l,l choose ^^^^^^^^^ than a a4..>l, data 
data processor. ^ ^ ^^^^^ it H sZr,^* more and the ^Ip 
p^ssor, but almost certainly the latter « ~ p„bably of the o«ier of 

,0 embodying the processor ^1. also be substant^lV ^^J^ ^^^ J^^^,, opacity 
100%. .he only processor that the des,gr«r c^ Jt^^^^^ ^^^^ 

of registers, then the chip being manufacU-red v«ll also have con 
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Further problems arise with the increase in datapath width in that the registers within the 
processors have also increased to have matching widths and many data being processed 
will be smaller than the datapath widths and thus large registers are used to store words in 
a wasteful manner. At the same time it is appreciated that the greater the number of 
5 dataprocessing registers available, then the more data can be stored in registers with fewer 
reads from or writes to cache or main memory. The disadvantage of providing a larger 
number of registers is the complexity and costs increase and as mentioned already 
increasing the size of the registers enabling them to store or manipulate a larger amount of 
data has the resultant disadvantages of cost, increased complexity and physical size. 

10 

RISC pipelining architecture has in general produced an increase in speed of processing to 
one command per processor system clock cycle. One particular model, known as the 
Harvard model, is used in such processors and has in many instances replaced the 
previously used von Neumann model. In the Harvard model the storage areas are 
15 separated and accessed by using different access routes. In both of these cases" 
processing and result sequencing of the command flow is carried out. 

Generally it has been realised that what is required is a processor that could be effectively 

infinitely configurable. What is needed is a generic processor. It will be appreciated that 
20 practically not every component of the processor needs to be infinitely variable. Various 

attempts have been made to do this, but heretofore have been relatively unsuccessful. 

Essentially what is required is a processor that can be specified exactly down to all the 

various components, whether they be the datapath width, memory, number of registers. 

size of registers and so on so that a designer can specify exactly the size and configuration 
25 of chip required to carry out the particular processing tasks. Thus, what is required is a 

processor with no redundant components either in number or size. . 

For example. U.S. Patent Specification No. 6,061,367 (Siemens) discloses a processor 
having a pipeline architecture and a configurable logic unit This processor includes as well 
30 as the configurable logic unit, an instnjction memory, a decoder unit, an interface device, a 
programmable structure buffer, an integer/address instruction buffer and a multiplex- 
controlled s-paradigm unit linking contents of an integer register file to a functional unit with 
programmable stmctures and having a large number of data links connected by 
multiplexers. The s-paradigm unit has a programmable hardware structure for dynamic 
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reconfiguration/programming while the program is running. The functional unit has a 
plurality of arithmetic units for arithmetic and/or logic linking of two operands on two input 
buses to produce a result on an output bus. a plurality of compare units having two input 
buses and one output bit. a plurality of multiplexers having a plurality of input buses and 
one or two output buses and being provided between the arithmetic units, the compare 
units and the register file, and a plurality of demultiplexers having one input bit and a 
plurality of output bits. A method is also provided for high-speed calculation with pipelining. 

Various other attempts have been made to provide improved constmctions of processors, 
such as, for example, those produced by the company Arm Limited. Typical examples of 
their processors are described in various U.S. Patent Specifications. For example, U.S. 
Patent Specification No. 5,969,975 (Arm) attempts to overcome tiie disadvantages of the 
complexity and increase in number of registers by providing an arithmetic logic unit to 
receive input operands from M X-bit registers to produce output datawords stored witiiin N 
Y-bit registers, where M/N = 3. 8<Y-X<16 and 3X=2Y. It is suggested that this 
arrangement is particularly suitable for digital signal processing and in situations where 
each input operand is used a plurality of times before a new input operand is loaded in its 
place in a register. 

U.S. Patent Specification No. 5,881.259 (Arm) is directed to accessing a memory having a 
plurality of memory locations for storing data values and in particular to a data processor 
tiiat prevents memory access. 

U.S. Patent Specification No. 6,021,476 (Arm) again is directed towards the accessing of 
memory in data processors. 

U.S. Patent Specification No. 5.961,633 (Arm) provides a data processor in which 
successive data processing instnjctions are again executed in a pipeline architecture. This 
processor contains conditional control means for preventing complete execution of a 
current instmction if either the memory detects that a memory access initiated by a 
preceding instruction is invalid, or if in some way it detects that the current instructions 
should not be executed. 
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U.S. Patent Spedfication No. 5,132.898 (Mitsubishi Denki Kabushiki Kaisha) describes 
another type of processor for carrying out operations between operands having different bit 
lengths of data and it illustrates very clearly the problems involved in the manipulation of 
such data. 

5 

While various attempts have been made, as mentioned already, to provide configurable 
processors, a considerable amount of the activity involved has been in improving the 
operation of processors generally and in improving their architecture without in fact tackling 
the major problem which is that what the user wants is a processor directed entirely 
10 towards the task in hand i.e. to allow the programmer or designer produce a processor, 
which processor will have a specification ideally matched to the processing requirements. 
Once this has been done then a considerable amount of the problems in relation to actual 
processing operations, etc. become less relevant. 

15 Statements of Invention 

According to the invention there is provided a processor having a number of components 
including at least a configurable arithmetic and logic unit, a plurality of registers, memory 
access, and datapaths between the components, characterised in that: 

20 

the datapath width is of variable bit size namely n bits; 
the number of the components are selectable; 
25 where appropriate the components are of n bit size; and 

each component is configured to handle data having one of two sizes 

< n or > n. 

30 

While the number of components is arbitrarily chosen, this is largely done for optimisation 
of the processor, but the processor essentially could have a components, where a was any 
number. It is often found in practice for example that producing 32 registers in the actual 
design of processor is an adequate number of registers for some particular uses. There is 
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however no reason why additional or less registers could not be produced and prepared. 
Many of the components will have to be of the n bit size to match the datapath width, but 
other of the components need not. For example, it is envisaged that the registers can be 
specified to any bit size, thus overcoming the problems as mentioned already in relation to 
5 register sizes. It is however important to appreciate that any input can be greater than or 
less than the datapath width size. Particularly this may be the case with memory where 
memory sizes will be larger than the datapath width. Further it may be that under normal 
operating conditions, the processor may be required to process data of 73 bits in length. 
The situation may arise where the processor is required to handle data of. for example. 150 
10 bits in length in rare situations. In this scenario, instead of designing a processor with a 
datapath width of 150 bits, the designer could design an optimal processor having a 
datapath width of 73 bits and program the processor to be able to handle the 150-bit piece 
of data as that situation arises. This will help to avoid redundancy under normal operating 
conditions. 

15 

Ideally such a processor comprises: 

means to select the number and size of each component; 

20 means to select the datapath width; 

means to configure the components for that datapath width; and 

means to compare the width of a data input to the selected datapath width that has 
25 been chosen for the component. 

By having such a processor, once the designer has specified the requirements, it is 
possible for the designer then to simply take the processor according to the present 
invention and input the various data. Then having inputted the various data requirements, 
30 such as. for example, in a database or other document, the processor can be used to 
effectively provide the processor and make it in silicon. What has been designed is a 
processor that will allow the developer to mould it to the need at hand. 

It will be appreciated therefore to a certain extent what is being provided according to the 
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present invention is not so much a processor, but in fact a template to allow a processor to 
be produced, in the sense that there will never be produced a processor of n bits wide. 
What will be produced for example is a processor with a datapath width of 37 with for 
example 15 registers, a configurable arithmetic and logic unit containing the logic required 
5 and memory access. This will then be realised in silicon, which will also mean that it will be 
as quick as using a standard 64-bit processor and only marginally more bulky and costly 
than a 32-bit processor. If fewer registers were a requirement, it might be less costly and 
bulky than an off-the-shelf 32-bit processor. 

10 In one embodiment of the invention when the immediate data of an instruction is limited in 
size to a preset number of bits and this number is less than n the immediate data is 
expanded to n bits wide. However, when the immediate data of each instaiction is greater 
than n. then the immediate data has to be truncated. 

Ideally the processor has special purpose registers and then general purpose registers. 
15 The general registers are dependent on the bit size of the data being handled and will be of 
size n bits, but not all of the special registers need to be of size n bits. 

It is envisaged that the general registers may be mounted extemal of the processor and the 
processor according to the invention is so-configured and thus all that a designer requires 
20 is to specify those registers to be held extemal. Also, most of the special registers can be 
mounted extemally. 

In a particular embodiment of the invention the registers are configured to allow their 
content to be written to memory extemal of the processor. In this way in certain situations 
25 the special registers can have two functions, which further reduces the size of the 
processor. They will act as general registers when required and will still be able to act as 
special registers. 

In some instances, all the general registers will indeed be n bits wide. 

30 

It is to be appreciated that data items of sizes other than n can be passed into the datapath 
of width n. In the processor, means are provided for extending or tmncating a data item of 
size X so that it matches the width n of the datapath. In the case of truncation, that is where 
X is greater than n, the data item of size x is truncated the size n with the most significant 
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end being discarded. If the data item of size x is less then n, the data item needs to be 
extended, how this is to be extended depends on two situations. The first situation is where 
the sign of the data item is to be maintained, here the (x-l)*^ bit Is replicated into bit x 
through to the (n-1)*^ bit. basically padding out the data item so that it fits the datapath 
5 width. The second situation is where no sign extension is required. In this situation, the 
data item of size x is padded out with zeros in the same range of bit locations as with the 
signed situation, until it is of width n. The only other case is where x is equal to n. Here 
there is no extension or truncation required so the data item of size x is passed straight into 
the datapath without any alterations. 

10 

Means are provided in the processor to perform logical operations on different halves of 
operands within the processor. Two different types of these half operand logical 
instnjctions are available. The X type operations swap the upper and lower halfwords of 
the first source operand and then perform the bitwise logical operation specified between 

15 this swapped operand and the second operand. The second type, S type operations, 
perform the bitwise logical operation specified on the two source operands, the upper and 
lower halfwords of the result are then swapped before it is passed on through the processor 
pipeline. The bitwise logical instructions that these type of operations involve are AND. 
NAND. OR, NOR, XOR and XNOR resulting in ANDS, NANDS. ANDX, NANDX. ORS. 

20 NORS. ORX. NORX, XORS, XNORS. XORX. XNORX and the immediate instruction 
equivalent versions. 

The processor, according to the present invention is designed and arranged so that 
separate functions to perfomri special logic operations can be added as separate units. 

25 These units will have been developed separate from the processor. However, the 
processor provides a single interface structure that presents common signals to all of these 
separate units thus enabling any one or any number of units to be added. This single 
interface is fixed providing 3 outputs that contain an operation code identifier (aluOp) and 
two operands (aluSI and aluS2) to perform the selected operation on. Also provided for 

30 are two inputs, one containing the result of the selected operation and the other a signal to 
indicate when the result is valid, \A^thin the processor itself, as these separate functions 
are added, so the ability of the processor to determine that these functions are to be used is 
developed, thus, when the processor is instnjcted to perform these separate operations, it 
will do so. These separate units can execute in one or more dock cycles (nnulticycle 
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Operation) and are integrated into the control logic of the microprocessor to the extend that 
stall control and data forwarding is performed identically to the way it is perfomied for the 
built in units. 

5 The processor according to the present invention can be so-arranged that both sets of 
registers can be shared between various processors. Thus, for example, in certain 
situations when more than one processor would be required in a particular application in 
the sense that while the designer or programmer might require two or more processors, 
that in the specifying of those processors it would be possible to use the same registers for 
1 0 both processors. 

It is envisaged that the processor according to the present invention will be embodied in a 
computer disk or the like storage medium and can be simply downloaded by an operator, 
the various parameters inputted, the processor configured and then downloaded for 
15 subsequent manufacture in silicon or the like material. 

Further the invention provides a method of designing a processor comprising the steps of: 

preparing an outline processor in general architecture having a series of 
20 components described by blocks or the like interconnected by various datapaths 

having at least a configurable arithmetic and logic unit, a plurality of registers, 
memory access, and such other units and components as are required for a 
processor of the type being designed and then defining the datapath width of 
variable bit size namely n bits; 

25 

choosing an artDitrary number of components greater than that which would ever be 
required such as, for example, 64 registers; or alternatively 

choosing components where a is any number that could be chosen; defining the 
30 components size as n bit size; and 

programming each component to handle data having one of two sizes, namely <n 
or > n. 
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In this way a general processor is designed and then subsequently when it is required to 
produce a processor from this general design the number of components, the datapath 
width size and so on are chosen and they are entered into a database, which database will 
allow a particular design of processor to be produced. 

5 

Detailed Description of the Invention 

The invention will be more clearly understood from the following description of an 
embodiment thereof given by way of example only with reference to the accompanying 
10 drawings, in which: 

Fig. 1 is a block diagram of a processor according to the invention and the 
external interfacing; 

15 Fig. 2 illustrates the basic processor pipeline; 

Fig. 3 illustrates the basic processor pipeline with control signals; 

Fig. 4 is a block diagram illustrating a bitwise logic X instruction; 

20 

Fig. 5 is a block diagram illustrating a bitwise logic S instruction; 
Fig. 6 illustrates the processor pipeline in more detail; 
25 Fig. 7 is a block diagram of the processor information; 

Fig. 8 is a flow diagram illustrating the data memory sign extend unit; 
Fig. 9 is an overall block diagram of the register unit; 

30 

Fig. 10 is a block diagram of the general purpose registers; and 



Fig. 11 is a block diagram of the register multiplexers (muxes). 
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Referring now to Fig. 1 there is illustrated in block diagrammatic form an outline of the 
processor according to the invention and the external interfacing to it. All of the external 
interfacing has various signals to and from the processor. The processor is identified by 
the reference numeral 1 and the principal components Illustrated are instmction 
5 decoding 2 which in turn feed an arithmetic logic unit (ALU) 4 through datapaths 5 of n 
bits wide. Further datapaths 5 are also illustrated as is a data memory control 6 fed from 
the arithmetic logic unit 4. The data memory control 6 also feeds the general purpose 
and special registers which together with the instruction decoding 2 also feed the 
arithmetic logic unit 4 through a mux 7. Signal descriptions for Fig. 1 are listed below 
10 and are elaborated on somewhat later. 

sysClk 

This is the system wide clock provided to the processor. All pipelining and registering 
within the processor js done on this clock. 

15 

sysReset 

This is the reset signal provided by the system. It is active high . 
lmAddr[m'1 :0] 

20 This is the instruction memory address bus. It can be synchronous or asynchronous. It 
is in byte address sizes but all values that appear on it are word addresses. On a reset 
this bus goes to zero. M is the configured program memory address width. 

imData[p'1 :0] 

25 This is the data from the instruction memory i.e. it is the instruction addressed by the 
instruction memory address bus. P is the configured size of the instruction data. 

imRdy 

This signal indicates when valid data is available from the instruction memory. If the 
30 instruction memory takes more than a clock cycle to produce valid data from when it is 
addressed, this signal must be pulled low until valid data is available. 
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dmAddriq-1 :0] 

During accesses to data memory, the address of the data location appears on this bus. It 
is a registered output. The addresses that appear on this bus are byte addresses. 

5 dmDataln[n'1:0] 

If a Load from memory instruction occurs, the data from the data memory location, 
addressed by dmAddr[q-1 :0], is passed to the processor on this bus. 

10 dmDataOut[n-1:0] 

If a Store to memory instruction occurs, the data to be written to the data memory 
location, addressed by dmAddrfq-l :0l appears on this bus. 

dmCS 

15 When this signal is HIGH it indicates that an access to memory is occurring 
dmRW 

This signal indicates to the memory whether a load or store is happening. If it is HIGH 
this indicates a store to memory and if it is LOW a load from memory is happening. 

20 

dmSiz[1:0] 

This output signal is used by the processor to indicate to the data memory when word, 
halfword or byte transfers are required. bOO indicates that the transfer is a byte, bOI 
indicates that the transfer is a halfword and b10 indicates that the transfer is a word. 
25 These values are valid for both loads and stores. 

dmRdy 

This input signal indicates when valid data is available from the data memory. If the data 
memory takes more than a clock cycle to produce valid data from when it is addressed. 
30 this signal must be pulled low until valid data is available. 

extint 

This input signal is the request from an external device to interrupt the processor. It must 
be held high for at least 1 sysClk clock cycle. 
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extlntAck 

When the processor receives the external interrupt and starts to service the interrupt, 
this signal is set high for 1 sysClk clock cycle to acknowledge the interrupting source that 
5 it has received the intenrupt. 

As explained the architecture of the processor is based around the Harvard architecture 
model. This model includes the non-sharing of instruction and data memory space which 
lends itself to a very low cycle per instruction count as there is no contention for memory. 
1.0 Potentially if there is zero wait memory, such as asynchronous SRAM, the processor will 
not have to stall and wait for any memory access to be completed. Essentially the 
processor according to the present invention is shown in five stages. This is illustrated 
in Fig. 2 

15 The pipelining technique allows the overlapped execution of multiple instructions. The 
pipeline in the present processor is divided into five stages. All of the stages use the 
same clock cycle so an instruction is completed every clock cycle and the duration of an 
instruction is five clock cycles. It will be appreciated therefore that this is a particularly 
suitable form of processor as the through-put is increased by a factor of five, under ideal 

20 conditions. It is important to appreciate that all the stages are active on every clock 
cycle. 

Referring now to Fig. 2 the elements of each stage of the pipeline is described in somewhat 
more detail with the memory connected thereto. The processor is again indicated by the 

25 reference numeral 1 and the stages are divided into five stages, namely a Fetch stage 10, 
a Decode stage 20 , an Execution stage 30, a Load and Storage stage 40 and a Write 
Back stage 50. Because the stages are identified by different reference numerals, the 
^ components previously identified by a reference numeral now may have a different 
reference numeral attached thereto. The Fetch stage 10 implements the loading of the 

30 next instruction to be executed. A program counter (PC) keeps track of the instruction 
number to be executed. The Fetch stage 1 0 includes an instruction memory 1 1 and 
address buses connected to this instruction memory 1 1 and a multiplexer (mux) 12 to 
select the next PC. The memory is addressed by the actual value of PC and the content of 
that position is registered and sent to the decode stage 20. The multiplexer 12 selecting 
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the next PC is dependent on the instnjction being decoded at the same clock cycle in the 
decode stage. It determines whether to choose from the PC +4 or the target address for 
branch or jump instructions. It is passed out as the instruction memory address and the 
data returned. In the Decode stage 20 after the instruction has been passed from 
5 memory the Decode stage 20 decodes the instruction to determine the operation to be 
perfonmed in operands that are selected by the instruction. These operands are from 
registered address by the instruction, or a value provided by the instruction. This is 
where the whole control of the whole pipeline occurs. It takes the instruction from the 
Fetch stage 10 and decodes it in order to set the signals which will control its execution. 

10 Part of the information present in the instruction being decoded are the addresses of the 
registers involved in some operations. There is thus provided a decoder 21, general 
purpose and special purpose registers 22. a sign extend unit 23 and a multiplexer 24 for 
selecting the next PC. Part of the information present in the instruction which is being 
decoded in the decoder 21 are the addresses of the registers 22, thus they address the 

15 source operands of the general purpose and special registers 22 and their contents are 
registered to become the inputs for the Execution stage 30. In the case of an immediate 
operation, the 16-bit immediate value, which is the usual value coming from the 
instruction is either sign extended or padded with zeroes in the sign extender 23. This 
sign extend unit 23 is a dedicated sign extend unit that will be described in some more 

20 detail below. Generally speaking the instruction data will be in 32-bits with 16-bit 

immediate value. The processor can be configured for higher inputs, but they are not 
generally required and thus in the description of the processor there is this limitation. 

When decoding a branch instmction or a jump, the value of next PC is appropriately 
25 changed. Decisions of whether to change or not to change the value of the PC and the 
calculation of the target address are done in the Decode stage 20 by means of control 
logic and an additional adder. In the case of TRAP, RET or RFE instructions the PC is 
also changed from the normal flow to a predetermined value. 

30 The Execution stage 30 is where the actual implementation of the operation decoded in 
the Decode stage 20 is performed. This is where an ALU unit 31 is illustrated which 
ALU unit 31 is in fact the ALU 4 already identified in Fig. 1. In the Execution stage 30 
the ALU operation indicated in the instructions and registers is performed and delivered 
to the next stage of the pipeline. It calculates the address for the data memory access in 
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the Load/Store stage 40 which will be performed in the next clock cycle in the 
Load/Store stage 40. The source operand could be either a register or an immediate. 
Thus, there is a multiplexer 32 provided to decided between them. 

5 The next stage is the Load/Store stage that also could be called the memory stage 40. 
The data memory address 41 and data buses, as well as the corresponding control 
signals in turn has a further multiplexer 42 to all the either the memory data or ALU 
result to be registered in what is effectively the last stage, which is the Write Back stage 
50. It passes data to be written to the general purpose registers or special purpose 
10 registers 22 and the control signal to do it. 

The above is a brief outline of the architecture. It does not describe it in great detail and 
indeed most of the architecture can be said to be essentially conventional. 

15 However, the processor according to the present invention is extensively configurable 
and parameterizable. The datapath width has been set at n bits and the number of 
registers and the size of instruction and data memories accessible are configurable. 

The data length of the datapath elements and almost all the registers of the processor 
20 can be configured to any width from 1 bit to n bits, namely a word length of n bits for the 
processor. The processor according to the invention also uses data of two other sizes, 
namely halfwords and bytes. Halfwords are half the width of the word length, needless 
to say if the word length is an odd length, namely n is not an even number, the half word 
is modulus of half the width of the word length. Sometimes in the following discussion 
25 the term byte, which is 8-bits, is used, but will be understood by those skilled in the art. 

According to the invention the width and amount of registers within the processor may 
be configured. Again this is described in more detail. For ease of design and use, it is 
normal to pick a maximum number of registers according to the invention, such as. for 
30 example. 64 registers and to design the processor for 64 registers. Thus, generally 
speaking the roisters, except for some are of n bits wide and the actual number of 
registers is arbitrarily chosen in due course as will be explained later. The important 
point to appreciate is that all these registers are provided which may be configured as 
required. 
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In the particular processor according to the present invention the instmction and data 
memories are physically outside of the processor, however, the amount of memory 
accessible is defined by the processor. Both the instmction memory address. imAddn 
5 and the data memory address. dmAddr, are generated by the processor and the width of 
these busses can be set to match the size of memory needed (see Fig. 1). 

The data width of these memories can also be configured with the data mismory width 
matching the width of the processor.datapath width, namely n. 

10 

In relation to the instruction memory while this instruction memory width is determined by 
the width of the instructions, the processor according to the present invention is so- 
arranged that the instruction memory width can be variable. However, at the present 
moment because it has been found that an instruction width of 32-bits is sufficient for the 
15 present design so it is canied out with the instruction width fixed at 32-bits wide. 

Obviously this has the ability to be changed if the instruction memory use some form of 
compression or alternatively could be extended. 

Various other configurations of the processor are included, for example, interrupts can 
20 be enabled or disabled, the number of interrupts required can be configured, special 

hardware functions can be added as special ALU operations. Further instructions which 
are derived from the instructions of the processor can also be added. Again, this is 
discussed in more detail below. 

25 Reference has already been made to the registers and they have been described as both 
general purpose registers and special registers. The general purpose registers (GPR) are 
the set of registers that can be read or written to by all instructions that access registers. 
Register RO always returns 0. The number of GPRs in the processor can be configured 
in this particular embodiment up to a maximum of 32 and. the width of the registers 

30 match the configured datapath width n. 

The special registers are a second set of registers in the processors. These registers 
can be read or written by Instructions that perform an operation where the two source 
operands are register values. The first four registers are used for controlling the 
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processors and these four registers are a reason register/link address register, 
exception address register and an interrupt register. It is possible to configure up to 32 
registers as witli the GPRs. The reason register explains the present state of the 
processor 4 bits are used and generally they are as listed below. 



n 


4 


3 


2 


1 


c 


Reserved 


1 


T 


0 


R 



where: 

10 R - Indicates that the processor has been powered up from a full hard system reset. 



0 - If the processor received an illegal instruction this bit will be set and the 

processor will start executing from the start address again. It also will have the 
effect of clearing the R bit so that it is indicated that the last reset was a soft reset 
15 as opposed to a hard system reset. If this bit has been set and there then is a 

hard reset, this bit wilt be cleared. 

T - If a trap instruction has been encountered, the processor will execute an 

exception service routine and while this routine is being executed, this bit is set. 
20 On exiting the exception service routine, this bit will be cleared. 

1 - This has the same operation as the T bit except it is set while an exception 

service routine is being executed as a result of an external interrupt. 

25 

The link address register contains the value of the return address when the code being 
executed jumps to another instruction and intends to retum back to the original section 
of code. An example of this is a procedure call. The width of this register has a minimum 
size of the instmction memory address width a if the datapath width n is less than a 
30 however, if the datapath width is greater than the instruction memory address width a. 
this register takes on the size of the datapath n. 
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The exception address register is at register address 2 in the special registers set. This 
register contains the value of the return address when the code being executed jumps to 
another instruction and intends to return back to the original section of code. The 
5 instructions that cause it are JAL and JALR. If those instructions are not present in the 
code, this register can be used as Special Register otherwise the return address will be 
overwritten. 

The interrupt register is at register address 1 in the special registers set. This register is 
10 n bits wide with the bottom half of the register holding the enable bits and the top half 
containing the pending bits. 

This register and support logic controls the interrupt handling of the processor. As this 
register matches the datapath width and two bits of the register are used per interrupt, 
15 then the number of allowable interrupts is n/2. When an interrupt is received the 

pending bit is set. Then if the enable bit is set. the processor will automatically service 
the interrupt. 

Both the General Purpose Registers and the Special Registers can exist either inside 
20 the processor or Outside it, depending on the configuration required. In this 

implementation, the first four Special Registers are always inside the processor, the rest 
of the Special Registers can be either inside or outside it. All the GPR's can be either 
inside or outside the processor. 

25 In the present implementation of the processor, there are four instruction formats . As all 
the opcodes have not been used, many more additional instructions may be introduced. 

In the present processor the instructions have initially been implemented at 32-bit wide. 
However, this has been set as a parameter of bit wide p, which can be changed if a 
30 reduction or expansion in instruction memory width is implemented and some form of 
Fetch stage decompression is used to expand the instruction to its intended size. 
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The first format is an l-type (immediate) instructions which manipulates data provided by 
a 16-bit signed or unsigned immediate field in the instmction. These immediate 
instructions break down as follows: 



Immediate ALU operations where the immediate is used jas an operand for the 
ALU and the result is written back to a register. 

Conditional branch instructions where the immediate is added as an offset to the 
Program Counter to transfer control of the processor to a different point in the 
source code. 

Load from Memory and Store to Memory instructions use the immediate data as 
the offset to a register value to generate the memory address to be accessed. 



31 26 


25 




21 


20 16 


15 
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Opcode 


Rd 


Rsl 


Immediate 


6 




5 




5 
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The immediate data of an instruction or the data from memory may be in either signed or 
unsigned binary format. If the data is in signed format, then it is imperative that the sign 
be maintained if the data should go through any expansion. The processor handles this 
by firstly determining whether or not a piece of data is in signed or unsigned format. If 
the data is in unsigned format, then the processor will populate the vacant bits of the 
datapath with zeroes. If the data should happen to be in signed format, then the vacant 
bit positions of the datapath up to the (n-l)**' bit are populated with the MSB of the data. 
Generally speaking, these will be populated with ones should the data be negative 
signed binary, and zeroes should the data be positive signed binary. Sign expansion will 
be discussed in more detail below. 
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The second format of instruction is the R-type (register to register) instructions which 
perform pure ALU type operations on two operands provided by two source registers 
specified in the instruction. The result is always destined for a register. The operation to 
be performed is specified by the aluop field of the instruction. Access to the Special 
5 Register set from source code can only happen through R-type instructions except for 
Special Register 1 (Link Address Register). Special Registers are identified by 3 1-bit 
flags in the instruction and are shown and explained below. 



31 




26 


25 




21 


20 


16 


15 


11 


10 9 


8 


7 


6 


5 




0 


R-R 


Rd 


Rs1 


Rs2 


Unused 


f1 


f2 


f3 


aluop 




6 






5 




5 




5 




2 


1 


1 


1 




6 





10 



f1 =0 

f1 = 1 



=> rs1 is addressed in the General Purpose Registers; 
=> rs1 is addressed in the Special Registers; 



15 f2 = 0 
f2 = 1 



=> rs2 is addressed in the General Purpose Registers; 
=> rs2 is addressed in the Special Registers; 



f3 = 0 
20 f3 = 1 



=> rd is addressed in the General Purpose Registers; 
=> rd is addressed in the Special Registers. 



The third type of instruction is the J-type (jump) instructions which are the unconditional 
jumps in source code transfer control. There are 4 instructions grouped in this type, 
Jump, Jump Register, Jump And Link and Jump And Link Register. The two Jump And 
25 Link based instmctlons retain the next instruction address from the jump instruction so 
that program control can return to the point the jump was executed. This address is 
stored in the Link Address Register in the Special Register set. 



30 



This following is the make up of the Jump and Jump And Link instmctions. A 26-bit 
name is sign extended and added to the Program Counter to create the address of the 
next targeted instruction 
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26 25 



Opcode 



Name 
26 



The Jump Register and Jump And Link Register instructions are constructed as follows, 
where rs1 is the General Purpose register address whose contents is the address of the 
targeted instruction. 



31 26 


25 21 


20 


16 


15 
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Opcode 


Unused 


Rs1 


Unused 


6 


5 




5 




16 





25 



10 The fourth type of instruction is C-type (control) instruction which is used for processor . 
control type functions. They contain a simple opcode with ho register or immediate 
referenced. 

The HALT instruction will stall the EVE Processor pipeline and continued operation will 
15 not commence until an interrupt is received. 

The RET instruction transfers control back to the section of code jumped from by a JAL 
or JALR instructions. 

20 The TRAP instruction is a mechanism for allowing software to transfer from the main 
code to the Exception Service Routine. 

The RFE instruction retums control from the Exception Service Routine back to the main 
code after either a TRAP instruction or an interrupt has been serviced. 



31 26 25 

Opcode ] Unused 
6 26 
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In most cases namely R type instmctions the use of rs1 .rs2 and rd is very clear. For 
example, it could be 

add r5,r4,r3 => rd = r5; rsl = r4; rs2 = r3 

5 

However in the case of I type, namely Load and Store type instmctions this is not that 
clear and thus it is necessary to be aware that: 

For a STORE e.g. SW offset(RIO), R3 rd = R3, rs1 = R10. immediate = 

10 offset 

For a LOAD e.g. LW R3, offset(RIO) rd = R3. rsl = RIO, immediate = 

offset 

Also for Jumps and Branches based on register values( BEQZ, BNEZ, JR, JALR) the 
15 register is in rs1 not rd i.e. in bits 21 :16 of the instmction word which means rd (bits 
25:21) should be zero. 

There are three situations that will require the whole pipeline to stop, one of which, 
namely, the Halt Instruction, has already been discussed. The other two situations are 
20 when either the data or instruction memories are not ready. The signals dataRdy and 
instRdy respectively are asserted in order to indicate to the stall controller that a stall is 
to occur. They tell the processor if memory accesses have happened or are about to 
happen. If the memory access does not happen, they stall the processor. 

25 As mentioned already, it is envisaged that many more instructions may be introduced 
and the processor according to the invention is configured to adapt to such further 
instmctions as defined as no instruction has a fixed opcode. 

One set of instmctions implement a branch conditioned on the comparison between the 
selected byte, halfword or word specified in a register and the corresponding byte, 
30 halfword or word specified by the immediate (only for bytes) or in another register. A 
comparison unit that performs all compares can be fully parameterised allowing any 
datapath size comparisons. A sub-block, of the comparison unit, of two muxes and XOR 
gates are parameterised to accept data from 1 up to 8-bits. Depending then on the 
datapath size, any number of these sub-blocks can be instantiated to form the datapath 
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width. A final sub-block, which takes its input from the previous sub-block, which 
compares this input with zero, will be of datapath size n. Of course, byte comparisons 
are only allowed when the datapath size is bigger than or equal to 8. When performing a 
byte comparison, the rest of the sub-blocks will force the output of the XOR gates so that 
5 the bits not being tested will not affect the final comparison. These instructions are 
implemented in two different formats one for the byte immediate branches and the 
branch on register compare. 

These yield in 22 new branch instructions 

10 

Table 1 - Branch Instructions 



No. of 
Opcodes 


Mnemonic 


Instruction meaning 


4 


BEQBxl 


Branch if byte x equal to byte immediate 


4 


BNEBxl 


Branch if byte x not equal to byte immediate 


4 


BEQBx 


Branch if byte x equal to byte register 


4 


BNEBx 


Branch if byte x not equal to byte register 




BEQHU 


Branch if upper half equal to half register 




BNEHU 


Branch if upper half not equal to half register 




BEQHL 


Branch if lower half equal to half register 




BNEHL 


Branch if lower half riot equal to half register 




BEQW 


Branch if word equal to register 




BNEW 


Branch if word not equal to register 



The format of the BEQBxl and BNEBxl instructions is as follows: 
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When these instmctions are executed, the immediate byte in the instmction is compared 
to a byte in the data item stored in a register pointed to by Rs1. This register address 
field is only 3 bits in size, therefore, the byte can only be compared to the contents of 
5 one of the first 8 GPRs. If the comparison is TRUE, the 15 bit Target is added to the 
contents of the PC and used as the next address. 



The format of the remainder of the Branch on value compare instmctions is as follows: 



26 

31 


25 




21 


20 16 


15. 




0 


Opcode 


Rsl 


Rs2 


Target 


6 




5 




5 




16 





Here, the values contained with in the two registers, addressed by the instruction, are 
compared and if this compare is true, the Target is added to the current PC and used as 
the next address. 



15 ALU instnjctions such as the Adds and Shjfts can be implemented to use a cany set by 
the execution of the previous instruction to affect the carry. Although these 
are not fully specified, the capacity to implement them is available. 

For the Add operation in the ALU, the previous carry is added along with the two source 
20 data. 

For Shift operations, the previous cany is shift in to the end of the data being shifted and 
the bit falling of the end is stored as the next carry bit. 

25 The carry bit can also be used to branch on. In executing this instruction, the branch will 
be taken if the carry is Set or Clear depending on the type of test specified. 

As already mentioned, means are provided in the processor to perfomri logical bitwise 
operations, such as AND, OR and Exclusive OR on different halves of operands. These 
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operations can be performed as either l-Type Instructions or R-Type instructions and so 
adopt tfie same instruction formats. 

After the definition of the pipeline stages elements, the next step is defining the control of 
5 their functionality. Because the instructions decoded in the Decode stage are effectively 
executed in the Execution or Memory stages, some control signals have to be generated 
and adequately <Jelayed to make them effective at the right time. 

The solution adopted by the present invention architecture is sending through the 
10 pipeline the control signals along with the data, so they automatically will appear at the 
right clock cycle in the expected stage. The problems that arise using this configuration, 
like hazards and stalls, will be discussed below. 

Referring to Figs. 2 and 3 all the control signals are generated in the Decode stage 20 
15 depending on the instmction being decoded. They are sent through the pipeline in the 
case of being used in Execution 30, Memory 40 or Write Back 50 stages or directly used' 
in the Fetch 10 or Decode 20 stages without being registered. 

There is only one control signal used in the Fetch stage 10. It is the select signal for the 
20 PC mux. The decision is taken in the Decode stage 20 after deciding if the PC should be 
just incremented by four (the normal program flow) or be loaded with a different value. 

The Decode stage 20 not only generates the control signals for other stages but also 
generates control signals for itself. It is the case of the signal controlling the sign extend 
25 unit. The immediate value coming in the instruction is either sign extended or padded 
with zeroes, diepending on the operation being signed or unsigned. The select signal 
indicates which extension has to be done. 

The Execution stage 30 also requires control signals for the muxes choosing the proper 
30 source operands for the ALU 31 operation and the ALU 31 needs a signal indicating 
which operation has to perform on them. There are also four control signals-passing 
through this stage. They will be used in the following stages. 
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Two of the four control signals received by the Memory stage 40 are used on it. One 
enabling the data memory 41, thus indicating a load or a store instruction and the other 
with additional information to generate more control signals for the data memory 41. 
These two signals are processed in a sub-block in order to generate the RW and size 
5 signals to accordingly control the data memory operation. This block also generates a 
select signal for the data mux. The mux chooses the memory contents in case of 
performing a load instruction, otherwise the ALU 31 result is passed to the final stage. 
The other two control signals pass through this stage and will be used in the last one. 

10 Despite the Write Back stage 50 does not have hardware at all, it passes the data to be 
written, as well as the destination register address and the RW signal to the general 
purpose and special registers 22. 

The flow of any instruction being executed in the processor starts in the Fetch stage. 
1 5 where PC addresses instruction memory 1 1 and the instnjction is read from that 

position. In the next clock edge, it is registered to the Decode stage 20 where the main 
decisions are taken. As the result of those decisions, the proper control signals are 
generated and sent to allow its execution. 

20 In the case of an R-type instruction, the first action done in the Decode stage 20 is 
addressing the source registers 22. The content of these registers is registered out to 
^/ the Execution stage 30 along with the ALU 31 operation, the select signals for the 

source operand muxes, the destination register address and the write enable signal. The 
rest of the control signals are driven to default. The select signal for the PC mux will 
25 chose PC+4 due the program flow. will normally continue. 

In the Execution stage 30, the source operand muxes pass the corresponding source 
operands and the operation indicated by the ALU operation signal is performed on them. 
The result is registered to the Memory stage. This stage and the Write Back stage 50 
30 pass the ALU result along with the address and control signals to the registers 22 to be 
written. It is because there is no data memory access to perform. 



When performing an l-type instmction involving an ALU 31 operation, the situation is 
similar to the one described above, except that the second source operand is an 
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immediate. It is extended and registered in the Decode stage 20. In the Execution stage 
30 it is chosen as a second operand, instead of Rs2, by means of the select signal. If the 
instruction is a load or a store, Rsl and the immediate are added to form the data 
memory 41 address. In the case of a store, Rs2 holds the data to be stored, so it is also 
5 passed to the Memory stage 40 and there Is no destination register. 

The control signals indicating the type of load or store instruction (signed or unsigned, 
byte, halfword or word) sent from the Decode stage 20 are processed in the Memory 
stage 40 to produce the proper control signals for the memory. It also generates a select 
10 signal for the mux choosing between the memory data (in case of a load instruction) or 
the ALU 31 result 

The store instruction is finished in the Memory stage 40, because there is no data to 
write back to the registers 22. Nevertheless, the Write Back stage 50 sends the data, 
15 destination register address and write enable signals to the registers 22 as usual. In this 
case, the data sent is the ALU 31 result and the destination register is RO (not writable).^ 

When the l-type instruction is a Branch the actions taken are different. The register 22 
indicated in the instruction is addressed as usual, but its content is compared with zero 
20 to decide whether the branch should be taken or not. It is done in the Decode stage 20. 
Depending on that decision, next PC is selected adequately in the Fetch stage. The 
optional Brach Instmctions which performs the comparison between selected byte, 
halfword or word specified by the immediate or in another register, are also done in the 
Fetch stage 20. 

25 

The control signals sent through the pipeline are defaulted because any actions are 
required further on. The Execution stage 30 thus performs an addition on the register 22 
addressed and RO and the destination address is set to RO. It is equivalent to perform a 
NOP which is defined as: ADD RO.RO.RO. 

30 

In the case of a branch taken, a signal is setJn the Decode stage 20 to indicate that the 
next instmction has to be annulled. It is because that instnjction was fetched while 
decoding the branch instruction but it should not be executed. If the branch is not taken, 
the program flow normally continues. 
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To annul an instmction means that despite it has been fetched, it will not be executed. 
To do so, the Decode stage 20 sends to the pipeline a NOP, ignoring the contents of the 
instmction. 

5 

The J-type instructions change the value of next PC unconditionally. What is decided in 
the Decode stage 20 is which value of next PC has to be chosen and whether to store or 
not the value of actual PC in order to continue the normal program flow after returning 
from the jump routine. These instructions cause the next instruction to be annulled. 

10 

The J instruction includes an offset to be added to the actual PC to form the target 
address. That address is chosen by the select signal of the PC mux as the value for next 
PC. A NOP is sent to the pipeline because nothing has to be calculated in the Execution 
stage onwards. 

15 

JAL instruction does the same, except that actual PC is stored in the Link Address 
Register, When the RET instruction is found, the value stored in LAR is loaded into next 
PC to allow the program flow to continue. 

20 JR and JALR address a register, which content is directly loaded as next PC. Again, in 
the case of a JR a NOP is sent to the pipeline and in the case of a JALR, the value of 
actual PC is stored in LAR and when the RET instruction is found, the value stored in 
LAR is loaded back into next PC to allow the program flow to continue. 

25 The control transfer instructions accordingly change the value of PC. The instruction 
TRAP or an interrupt cause next PC to be loaded with a predetermined address and 
actual PC to be stored in Exception Address Register (EAR). The content of that 
address is either the first instruction of the Exception Service Routine (ESR) or an 
instruction to jump to it. RFE marks the end of the ESR and causes next PC to be 

30 loaded with the contents of EAR. 

RET does the same as RFE, but loading the content of LAR instead. The instruction 
HALT causes the whole pipeline to stall until an intemjpt is received. Every stage keeps 



pcT/iE 0 1 /oo oo: 

28 

doing the actions they were doing when the HALT Instruction came in. until the pipeline 
is released and the inputs of the stages are able to change. 

Fig. 4 illustrates the implementation of a bitwise logic X instruction to be carried out on 
5 two operands, A and B. each having an even number of bits. First of all. the upper and 
lower halves of data of the first operand are swapped before the bitwise logical operation 
is carried out producing the result. The bitwise logical operation block may represent 
any of AND. NAND. OR. NOR. XOR or XNOR operations. The instructions produced 
are ANDX. NANDX. ORX. NORX, XORX and XNORX. 

10 

For example, if we take two inputs of four bits each, input A being 1001 and input B 
being 001 1 . and we are to perform an ANDX operation on the data, then we would first 
of ail have to swap the data in the top half of input A with the data in the bottom half of 
input A, namely, bits 2 and 3 with bits 0 and 1. This would mean input A would become 
15 0110. The operands are then passed through an AND gate giving a result of 0010. 

Another similar type of instruction is one bitwise logic S instruction, as shown in Fig. 5. 
In this case of bitwise logic S instructions, the logic operation, i.e. AND, NAND, OR, i 
NOR. XOR. XNOR is performed before the data in the upper half of the result is 
20 swapped with the data in the lower half of the result. These instructions are denoted by 
ANDS. NANDS, ORS, NORS. XORS and XNORS. 

When the X and S operations are performed on data of uneven bit number, one bit of 
data is discarded and the remainder of the data is operated upon in the manner already 
25 described. It has been found convenient to discard the central bit of data in data of 
uneven bit. 

All of the above-mentioned functions may be achieved using extensive cross-wiring 
techniques. 



More detail with the addition of the extra hardware to prevent stalls and hazards, which 
includes the muxes in front of the registers and the forwards, the pipeline shown in Fig. 3 
becomes the one shown in Fig. 6. 
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Fig. 7 illustrates the top level break down of the core into 4 main areas. 

There are shown four essential areas namely the Instruction Unit 61, the register unit 62. 
the execution unit 63 and the data unit 64. 

5 

The instruction unit 61 is the section where all work is done with the instruction, control 
of fetching it from the instaiction memory, decoding it and setting control signals for the 
rest of the processors. 

10 The register unit 62 is separated from the Instruction Unit This is aimed towards 
synthesis, as there will be a large amount of actual registers implemented. It is 
addressed by the decoding of the instmction to present operands to the Execution Unit. 

The execution unit 63 is the implementation of the Execution stage of the EVE 
15 Processor pipeline. This is in Its own block as concentration can be put on it because of 
its importance and possibly its critical timing. 

. Finally, the Data Unit 64 is the remainder of the processor, which in effect does 
something with data, writes data to memory reads data from memory and then writes 
20 data back to a register of the processor. 

Data being written to or from memory may be of a different length to the datapath width. 
Often, the data will have to be extended to populate the entire datapath width. If the 
data is in signed format, this is particularly important as the sign of the data must be 
25 maintained. A signal is generated, dmSESel, to select which kind of sign extension has 
to be performed on the data coming from memory. It is asserted when loading a byte or 
a halfword. and also depends on the type of load being performed (signed or unsigned). 
Otherwise, data coming from memory goes through this sub-block without being 
changed. The encoding values are shown in Table 2. 



30 
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Table 2 ■ Encoding values for dmSESel signal 









Oxx 


PASSTHROUG 
H 


Pass word through 


100 


ZEXTBYTE 


Zero Extend byte 


101 


SEXTBYTE 


Sign Extend byte 


110 


ZEXTHALF 


Zero Extend halfword 


111 


SEXTHALF 


Sign Extend halfword 



The data memory Sign Extend Unit not only performs the sign extension indicated by 
dmSESel, but also places the conrect byte or halfword coming from data memory into the 
register. That operation depends on the signal endian, \A4iich indicates if the system is 
10 accessing data memory in little endian mode {endian = 1) or big endian mode {endian = 
0). The flow diagram for the block is shown in Fig. 8. where a 32 bit data path is 
assumed. 

Once the size (byte or halfword) is detennined, the signal endian is checked in order to 
15 pick the conrect part of the data and place it in the register. Then, the corresponding byte 
or halfword is either sign extended or zero extended. If the size indicates word, the 
content of the memory position addressed is just placed in the register, as it passes 
through this module without suffering any transfomnations. 

20 

The Register unit 62 is built up of 3 sections. Fig. 9 shows an overview of the Register 
Unit 62 and its main components. The Register Files themselves are separated into two 
banks, the General Purpose Registers and the Special Registers. All registers are 
synchronous to the system clock sysClk, 



25 
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Within the General Purpose Registers (GPRs) there are 32 addressable registers. There 
are in fact only 31 registers, each of these being 32-bit wide, with register 0 not being 
made up of actual registers but is a constant 32-bit zero. A block diagram of the GPRs is 
shown in Fig. 10. which breaks down into 3 sections. 

By means of the parameterisation of the EVE architecture, the GPR block can be 
outside of the processor. In that case the inputs to the block will be driven to that 
external block and its outputs will be connected to the corresponding inputs in the 
Register Muxes block to be chosen as source operands if selected. 

The signal dataBack returns to the registers in what is the Write Back stage of the 
processor pipeline. It contains the new data to be written to a register by an instruction. 



This demultiplexing is controlled by the addrDestReg bus that contains the address of 
15 the destination register and the write enable signal. writeRegEn, By ANDing this write 
enable signal with the inverse of bit 5 of the destination register address creates a select 
for the GPRs. Each of the remaining bits of the destination register address are ANDed ; 
with this generated enable signal and if the enable is not set this will cause a write to j 
register 0 which does not store a value. 

20 

The Register Block is implemented just as registers. Note that these must maintain their 
value on every clock period. 

When an instruction addresses a register so as to use its contents, it puts a 6-bit 
25 address on one of two busses, to set this data up as operand 1 or operand 2 or both. All 
but the top bits of these two busses drive multiplexers that select the register value, as 
seen in Fig. 10 above. 

The Special Registers allow up to 32 addressable registers, where the first four are 
30 always present as they keep specific processor information. These four registers are the 
Reason Register at binary address 100000, the Interrupt Register at binary address 
100001, the Exception Address Register (EAR) at binary address 100010 and the Link 
Address Register (LAR) at binary address 10001 1. The bitfield definitions of these 4 
registers have been described above. 
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The Exception and Link Address Registers are just written to as normal through the 
pipeline and does not need any special logic around it. The Reason Register and the 
Interrupt Register, however, require further logic to handle resets and external interrupts 
5 as they happen. 

While those four registers are always inside the processor, the rest of the special 
registers can be either inside or outside it The placement of the registers is detennined 
by the parameterisation. 

10 

In the case of the registers being outside the processor, the outputs of the register bank 
go to the Special Registers module shown in Fig. 9 above. It is to supply the right 
addressed Special Register to the Register Muxes. 

15 This register holds two bits of infonnation. the enable bit and the pending bit. So. it must 
react to the resets and the intenrupt control signals. A rising edge must be detected on ^'^ 
extint the signal from the external interrupting source. It will set the pending bit. This 
must be maintained until either an internal acknowledge or a reset has been received. At 
this point an acknowledge must also be given back to the extemal interrupting source. 

20 The processor can read from this bit. but not write to it. 

On the other hand, the enable bit can be read from and written to by the processor. 
When this bit is set, and incoming interrupt will be serviced, otherwise it will be ignored. 

25 The Reason Register has to show the present state of the processor and cannot wait for 
the latency of data passing through the pipeline. It has to react to a hardware reset 
sysReset an illegal instruction sReset and either a Trap instmction or an Interrupt being 
serviced. 

30 The exception address register will keep the PC value at the clock cycle a TRAP or an 
interrupt changes the program order to be serviced. The instruction corresponding to this 
PC value is annulled, so its execution has to be restarted once the Exception Routine is 
finished. 
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EAR is written by the though the pipeline when a TRAP or an interrupt are detected and 
can be serviced. The address stored is read by the instruction RFE, which causes next 
PC to be loaded with it in order to fetch again the instruction previously annulled. 

5 The link address register is used when the execution of JAL or JALR will cause a 

change in the program order. They change the value of PC to the target address and the 
annulation of the instruction just been fetched. The address of that instruction is stored in 
LAR. 

10 LAR is written by the though the pipeline by JAL or JALR. Instruction RET will read the 
stored address, which will be loaded into next PC to restart the execution of the 
instmction previously annulled. 

There are only two operands required by the Execution Unit and either a General 
15 Purpose register or a Special register can be selected at any one time for each of these 
operands. The data returned to a specific register may be needed in the next clock 
period before it can be written back into the register, therefore, a forwarded path of 
dataBack is required. 

20 The case of a stall occurring also needs to be covered where the data in the previous 

clock period needs to be sent through again. The block diagram in Fig. 1 1 shows the two 
muxes that perform the selection of the data as the sources for the Execution Unit. The 
control for these muxes is handled in the Instruction Unit, where the instruction is 
decoded and thus the decision of what data to use is implemented. The data selected is 

25 then registered out of this pipe stage and into the Execution stage. 

The inputs regSI and regS2 are driven by the outputs of the General Purpose Registers, 
either being placed inside or outside. This fact is reflected in the block instantiation, 
where the actual signals are connected to the formal signals of the block. 

30 

The situation of the signals sRegSI and sRegS2 is different. They always come from the 
Special Registers block independently of part of the registers being inside or outside. 
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In essence, what has been produced is a generic processor, that is, a processor that can 
be adapted to the requirements of the job in hand. This generic processor is an outline 
design for a device that may be stored as a computer program on a record medium. In 
other words, the processor may be seen as a template from which further processors 
5 may be derived from. The general design is there and all the designer has to do is to 
input his specific requirements and generate a specific processor from the template. 
The designer may then go and realize the designed processor. This may be on a 
purpose built chip or the designer may realize the processor on a Field Programmable 
Gate Anray (FPGA). depending on his/her own requirements. 

10 

Some of the embodiments of the invention described with reference to the drawings 
comprise processes performed in computer apparatus. The invention also extends to 
computer programs, particulariy computer programs on or in a carrier adapted for putting 
the invention into practice. The code may be in source code, object code or a code 
15 intermediate source and object code or any other form suitable for use in the 
implementation of the methods according to the invention. 

The carrier may comprise a storage medium, for example, a ROM. CD or 
semiconductor, floppy disk or any other recording medium. Alternatively, the carrier may 
20 be a transmissible carrier such as an electrical or optical signal that may be conveyed by 
an electric or optical cable or any other means. When the program is embodied in a 
signal on such cables or other means, the carrier may be constituted by such means. 

The carrier may also be an integrated circuit in which the program is embedded, the 
25 integrated circuit being adapted for performing or for use in the pert'ormance of relevant 
methods. 

In the specification the terms "comprise, comprises, comprised and comprising" or any 
variation thereof and the terms "include, includes, included and including" or any 
30 variation thereof are considered to be totally interchangeable and they should all be 
afforded the widest possible interpretation and vice versa. 

The invention is not limited to the embodiment hereinbefore described, but may be varied in 
both construction and detail within the scope of the claims. 
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CLAIMS 

1 . A processor (1 ) having a number of components including at least: 
a configurable arithmetic and logic unit (4); 
a plurality of registers (3); 
memory access; and 
datapaths (5) between the components, 
characterised in that: 
1 5 the datapath width is of variable bit size namely n bits; 

the number of the components are selectable; 

where appropriate the components are of n bit size; and 

20 

each component is configured to handle data having one of two sizes 

< n or > n, 

25 2. A processor (1) as claimed in claim 1 comprising: 

means to select the number A and size of each component; 
means to select the datapath width; 

30 

means to configure the components for that datapath width; and 

means to compare the width of a data input to the selected 
datapath width that has been chosen for the component. 
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3. A processor (1) as claimed in claim 1 or 2 in NA^ich the immediate data of an 
instmction occupies a fixed size in memory and when the size is greater than n bits» 
the size is truncated to n bits. 

5 

4. A processor (1) as claimed in claim 1 or 2 in which the immediate data of an 
instmction is limited in size to a preset number of bits and when this number is less 
than n the immediate data is extended to n bits wide. 

10 5. A processor (1) as claimed in claim 4 in which there is provided means to determine 
whether the immediate data of an instruction is in signed or unsigned format. 

6. A processor (1) as claimed in claim 5 in which on determining that the immediate 
data of an instruction is in signed fomnat the immediate data is expanded to n bits 

15 wide with the vacant bits being populated with the most significant bit (MSB) of the 

immediate data. 

7. A processor (1) as claimed in claim 5 in which on determining that the immediate 
data of an instruction is in unsigned format the immediate data is expanded to n bits 

20 wide with the vacant bits being populated by zeros. 

8. A processor (1) as claimed in any preceding claim in which the processor has 
special purpose registers and general purpose registers (22). 

25 9. A processor (1) as claimed in claim 8 in which the general purpose registers are 
mounted extemal of the processor (1). 

10. A processor (1) as claimed in claim 8 or 9 in which one or more of the special 
purpose registers are mounted extemal of the processor (1). 

30 

11. A processor (1) as claimed in any of claims 8 to 10 in which the registers are 
configured to allow their content to be written to memory extemal of the processor 
(1). 
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12. A processor (1) as claimed in any of claims 8 to 11 in \Nh\ch the special registers 
may be written to external memory and used as general registers. 

13. A processor (1) as claimed in any of claims 8 to 12 in which all the general registers 
5 are n bits wide. 

14. A processor (1) as claimed In claim 13 in which the most significant bit (MSB) of a 
location in a register is the (x-l)** bit and the MSB of data of size n is the (n-l)*^ bit. 

10 15. A processor (1) as claimed in claim 14 in which means are provided for determining 
whether the n bit data is in signed or unsigned fomnat. 

16. A processor (1) as claimed in claim 15 in which the means are provided for writing 
the n bit data to an address of size x bits greater than n bits, the bits in positions 

15 between (x-1) and (n-1) inclusive are populated by the MSB of the n bit when the n 

bit data is in signed format. 

17. A processor (1) as claimed in claim 15 in which means are provided for writing the ri 
bit data to an address of size x bits greater than n bits, the bits in positions between 

20 (x-1) and n are populated by zeros when the n bit data is in unsigned format. 

18. A processor (1) as claimed in claim 15 in which means are provided for writing the n 
bit data to an address of size x bits less than n bits, the bits in positions between (n- 
1) and X inclusive are truncated. 

25 

19. A processor (1) as claimed in any preceding claim in which when it is required to 
perform logical operations on data in the high order address of one word with data 
in the lower order address of another word when n is an even number, means are 
provided to perform logical operations on the data in the top half of the word with 

30 data in the bottom half of the other word. 



20. 



A processor (1) as claimed in any of claims 1 to 19 in which when it is required to 
swap the upper order address of a word with the lower order address of that word 
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when X is an even number, means are provided to swap the top half of a word with 
data in the bottom half of the word. 

21. A processor (1) as claimed in any of claims 1 to 20 in which when it is required to 
5 perfomn logical operations on data of one word with data of another word and 

subsequently to swap the data in the upper half of the result with data in the lower 
half of the result and when x is an even number, means are provided to swap the 
data in the upper half of the result with the data in the lower half of the result. 

10 22. A processor (1) as claimed in any of claims 1 to 21 in which when it is required to 
perfomn logical operations on data in the high order address of one word with the 
data in the lower order address of another word when n is an uneven number 
means are provided to discard the central bit of each word and to perform logical 
operations on the data in the top half of the word with data in the bottom half of the 

15 other word. 

23. A processor (1) as claimed in any of claims 1 to 22 in which when it is required to 
swap the upper order address of a word with the lower order address of that word 
where n is an uneven number, means are provided to discard the central bit of the 

20 word and to swap the upper order address of the word with the lower order address 

of that word. 

24. A processor (1) as claimed in any of claims 1 to 23 in which when it is required to 
perform logical operations on two words of data and to subsequently swap the data 

25 in the high order address of the result with the data in the low order address of the 

result when x is an uneven number, means are provided to discard the central bit 
and swap the data in the high order address with the data in the low order address 
of the result 



30 25. 



A processor (1) as claimed in claims 19 to 24 in which the means to perform the 
swapping of data is provided by cross-wiring techniques. 
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A processor (1) as claimed in any preceding claim in which there is provided 
additional logic drcuitry for spedfic logical operations and an interface for 
communication with the additional logic circuitry. 

A processor (1) as daimed in daim 26 in which additional logic circuitry may be 
added subsequent to the realisation of the processor. 

A processor (1) as daimed in any preceding daim in which there are at least two 
processors sharing common general purpose registers. 

A processor (1) as claimed in any preceding daim in which there are at least two 
processors sharing common special purpose registers. 

A processor (1) as daimed in any preceding daim in which the processor is 
embodied in a software program. 

A processor (1) as daimed in daim 30 in which the processor embodied in a 
software program is stored on a record medium. 

A processor (1) as daimed in daim 30 in which the processor embodied in a 
software program is carried on an electrical carrier signal. 

A processor (1) having the structure of a processor as daimed in any preceding 
daim in which n is given a desired value. 

A processor (1) as claimed in daim 33 in which A is given a desired value. 

A processor (1) as claimed in daim 33 or 34 in which processor is embodied in a 
software program. 

A processor (1) as daimed in daim 35 in which the processor embodied in a 
software program is stored on a record medium. 
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A processor (1) as claimed in claim 35 in which the processor embodied in a 
software program is carried on an electrical canrier signal. 

A method of designing a generic processor comprising the steps of: 

preparing an outline processor in general architecture having a series of 
components described by blocks or the like interconnected by various 
datapaths having at least a configurable or arithmetic and logic unit (4). a 
plurality of registers (3), memory access and such other units and 
components as are required for a processor of the type being designed and 
then defining the datapath width of variable bit size, namely n bits. 

choosing A components where A is any number that could be chosen; 

defining the component size as n bit size; and 

programming each component to handle data having one of two sizes, 
namely 

<n or>n. 

A method of designing a processor as claimed in claim 38 in which the designer 
chooses an ari^itrary number of components greater than that that would ever be 
required. 

A method of designing a customised processor using the processor of claim 2 
comprising the steps of: 

selecting the datapath width; 

selecting the number and size of each component; 

configuring the components for that datapath width; and 
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configuring each component to handle data having one of two sizes, namely 

< n or > n. 

5 41. A computer program comprising program instructions for causing a computer to 
perform the method of claim 40. 

42. A computer program according to claim 41 embodied on a record medium. 

10 43. A computer program according to claim 41 stored in a computer memory. 

44. A computer program according to claim 41 embodied in a read-only memory, 

45. A computer program according to claim 41 carried on an electrical carrier signal. 

15 

46. A computer program according to claim 41 carried on an optical carrier signal. 
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ABSTRACT 
"A Data Processor"' 

5 A processor (1) of the type having a number of components including at least a 

configurable arithmetic and logic unit (4), a plurality of registers (3), memory access and 
datapaths (5), between the components. The datapath width is of variable bit size, namely, 
n bits, the number of components are selectable and where appropriate, are of n bit size 
and each component is configured to handle data having one of two sizes namely < n or > 
10 n. In essence, there is provided a generic processor that may be tailored to suit the specific 
tasks, space and computational requirements that have been determined by a designer. A 
method is also provided for designing such a processor. 
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